A corpus for OCR research on mathematical expressions
Internal identifier: 001424 (Main/Exploration); previous: 001423; next: 001425
Authors: Utpal Garain [India]; Bidyut Baran Chaudhuri [India]
Source:
- International journal on document analysis and recognition : (Print) [ 1433-2833 ] ; 2005.
French descriptors
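- Pascal (Inist): Reconnaissance caractère, Reconnaissance optique caractère, Reconnaissance automatique, Base donnée, Apprentissage probabilités, Formule mathématique, Evaluation performance, Evaluation expression, Analyse statistique, Approche probabiliste, Segmentation.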
- Wicri :
- topic : Base de données.
English descriptors
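- KwdEn: Automatic recognition, Character recognition, Database, Expression evaluation, Mathematical formula, Optical character recognition, Performance evaluation, Probabilistic approach, Probability learning, Segmentation, Statistical analysis.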
Abstract
This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. Construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and usefulness of this analysis is demonstrated in the related research problems, namely, (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. Moreover, a groundtruthing format has been proposed to facilitate automatic evaluation of expression recognition techniques.
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000416
- to stream PascalFrancis, to step Curation: 000371
- to stream PascalFrancis, to step Checkpoint: 000443
- to stream Main, to step Merge: 001473
- to stream Main, to step Curation: 001424
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">A corpus for OCR research on mathematical expressions</title>
<author><name sortKey="Garain, Utpal" sort="Garain, Utpal" uniqKey="Garain U" first="Utpal" last="Garain">Utpal Garain</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
<placeName><settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">06-0054256</idno>
<date when="2005">2005</date>
<idno type="stanalyst">PASCAL 06-0054256 INIST</idno>
<idno type="RBID">Pascal:06-0054256</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000416</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000371</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000443</idno>
<idno type="wicri:doubleKey">1433-2833:2005:Garain U:a:corpus:for</idno>
<idno type="wicri:Area/Main/Merge">001473</idno>
<idno type="wicri:Area/Main/Curation">001424</idno>
<idno type="wicri:Area/Main/Exploration">001424</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">A corpus for OCR research on mathematical expressions</title>
<author><name sortKey="Garain, Utpal" sort="Garain, Utpal" uniqKey="Garain U" first="Utpal" last="Garain">Utpal Garain</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
<placeName><settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Automatic recognition</term>
<term>Character recognition</term>
<term>Database</term>
<term>Expression evaluation</term>
<term>Mathematical formula</term>
<term>Optical character recognition</term>
<term>Performance evaluation</term>
<term>Probabilistic approach</term>
<term>Probability learning</term>
<term>Segmentation</term>
<term>Statistical analysis</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance automatique</term>
<term>Base donnée</term>
<term>Apprentissage probabilités</term>
<term>Formule mathématique</term>
<term>Evaluation performance</term>
<term>Evaluation expression</term>
<term>Analyse statistique</term>
<term>Approche probabiliste</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Base de données</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. Construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and usefulness of this analysis is demonstrated in the related research problems, namely, (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. Moreover, a groundtruthing format has been proposed to facilitate automatic evaluation of expression recognition techniques.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
</country>
<region><li>Bengale-Occidental</li>
</region>
<settlement><li>Calcutta</li>
</settlement>
<orgName><li>Institut indien de statistiques</li>
</orgName>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Garain, Utpal" sort="Garain, Utpal" uniqKey="Garain U" first="Utpal" last="Garain">Utpal Garain</name>
</noRegion>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001424 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001424 | SxmlIndent | more
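Outside Dilib, a saved copy of the record could also be inspected with a general-purpose XML parser. The following is only a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The `RECORD` string is a trimmed, hypothetical excerpt of the record shown above, and the `summarize` function is an illustrative name, not part of Dilib or Wicri. The `wicri:` and `inist:` prefixed elements are left out on the assumption that their namespaces are not declared in the exported record, in which case a strict parser would reject them.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of the record above (not the full export).
# Elements and attributes with wicri:/inist: prefixes are omitted because
# ElementTree refuses prefixes whose namespaces are not declared.
RECORD = """
<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">A corpus for OCR research on mathematical expressions</title>
<author><name sortKey="Garain, Utpal" first="Utpal" last="Garain">Utpal Garain</name></author>
<author><name sortKey="Chaudhuri, B B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name></author>
</titleStmt>
<publicationStmt><idno type="RBID">Pascal:06-0054256</idno><date when="2005">2005</date></publicationStmt>
</fileDesc></teiHeader></TEI></record>
"""


def summarize(xml_text):
    """Return title, author names, RBID and year from a record excerpt."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [name.text for name in root.findall(".//titleStmt/author/name")],
        "rbid": root.findtext(".//idno[@type='RBID']"),
        "year": root.find(".//publicationStmt/date").get("when"),
    }


summary = summarize(RECORD)
print(summary)
```

For the full record, the prefixed elements would first need namespace declarations added (or the prefixes stripped) before a parser of this kind would accept it.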
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:06-0054256 |texte= A corpus for OCR research on mathematical expressions }}
This area was generated with Dilib version V0.6.32.